63 research outputs found

    From Unstructured Data to Narrative Abstractive Summaries

    To provide easy and optimal access to digital information, narrative summaries must have a coherent and natural structure. Depending on how a summary is produced, a distinction can be made between extractive and abstractive summaries. Using an abstractive summarization approach, the relevant information (e.g., who?, what?, when?, where?) can be fused together, leading to the generation of one or more new sentences. However, in order to do this it is necessary to obtain and process the temporal information in a text. A very effective approach is to generate timelines from multiple documents, so that summary generation is supported by the timeline without losing the relevant temporal information of the texts. In this proposal, an enriched timeline is generated automatically, and the process of generating abstractive summaries using this timeline as a basis is presented [Barros et al., 2019]. Finally, potential applications of automatic timeline generation are presented, for example its application to Fake News detection. This research work has been partially funded by the projects PROMETEU/2018/089 and RTI2018-094653-B-C22.
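    The timeline-then-generation idea can be sketched as follows. This is a minimal illustration with invented data and field names, not the authors' implementation: event mentions are anchored to time points, grouped into a cross-document timeline, and one new sentence is generated per time point by fusing the who/what arguments.

```python
from collections import defaultdict

# Toy event mentions extracted from several documents (invented data).
events = [
    {"date": "2019-03-01", "who": "the committee", "what": "approved the plan"},
    {"date": "2019-03-01", "who": "the chair", "what": "announced the vote"},
    {"date": "2019-03-05", "who": "the committee", "what": "published the report"},
]

def build_timeline(events):
    timeline = defaultdict(list)
    for ev in events:
        timeline[ev["date"]].append(ev)      # group mentions by time point
    return dict(sorted(timeline.items()))    # chronological order

def summarize(timeline):
    # One generated (abstractive) sentence per time point, fusing arguments.
    return ["On {}, {}.".format(date, " and ".join(
                "{} {}".format(e["who"], e["what"]) for e in mentions))
            for date, mentions in timeline.items()]
```

Because the summary is driven by the timeline rather than by sentence extraction, the temporal order of the source events is preserved by construction.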

    Ordenación de eventos multidocumento usando inferencia de relaciones temporales y modelos semánticos distribucionales

    This paper focuses on the contribution of temporal relation inference and distributional semantic models to the event ordering task. Our system automatically builds ordered timelines of events from different written texts in English by performing first temporal clustering and then semantic clustering. In order to determine temporal compatibility, an inference over the temporal relationships between events, automatically extracted by a temporal information processing system, is applied. Regarding semantic compatibility between events, we analyze two different distributional semantic models: LDA topic modeling and Word2Vec word embeddings. Both semantic models, together with the temporal inference, have been evaluated within the framework of SemEval 2015 Task 4 Track B. Experiments show that, using both models, the current state of the art is improved, showing a significant advance in the cross-document event ordering task. This paper has been partially supported by the Spanish government, project TIN2015-65100-R, project TIN2015-65136-C2-2-R and PROMETEOII/2014/001.
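    The two-stage grouping can be sketched as follows. The thresholds and embedding vectors below are invented stand-ins (the real system uses trained Word2Vec/LDA models): events are first clustered by anchor date (temporal compatibility) and each date cluster is then split by embedding similarity (semantic compatibility).

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def cluster_events(events, threshold=0.8):
    # events: (date, event_word, embedding) triples
    by_date = {}
    for date, word, vec in events:           # stage 1: temporal clustering
        by_date.setdefault(date, []).append((word, vec))
    clusters = []
    for date, mentions in sorted(by_date.items()):
        groups = []                          # stage 2: semantic clustering
        for word, vec in mentions:
            for group in groups:
                if cosine(vec, group[0][1]) >= threshold:  # compare to group seed
                    group.append((word, vec))
                    break
            else:
                groups.append([(word, vec)])
        clusters += [(date, [w for w, _ in g]) for g in groups]
    return clusters
```

With this two-pass design, only events that are both temporally and semantically compatible end up in the same timeline entry.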

    Event ordering through temporal expression resolution

    In this paper a multilingual method for event ordering based on temporal expression resolution is presented. This method has been implemented in the TERSEO system, which consists of three main units: temporal expression recognition, resolution of the coreference introduced by these expressions, and event ordering. By means of this system, chronological information related to events can be extracted from document databases. This information is automatically added to the document database so that question answering systems can use it in cases referring to temporality. The system has been evaluated, obtaining results of 91% precision and 71% recall. For this purpose, a blind evaluation process was developed, guaranteeing a reliable annotation process that was measured through the kappa coefficient. This paper has been supported by the Spanish government, projects FIT-150500-2002-244 and FIT-150500-2002-416.
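    The pipeline can be sketched in the spirit of TERSEO's three units, with invented rule names that are not the system's API: a few temporal expressions are recognized, resolved relative to the document creation time (DCT), and the events are then ordered chronologically.

```python
from datetime import date, timedelta

# Illustrative resolution rules: map a relative expression to a concrete
# date given the document creation time (DCT).
RULES = {
    "today": lambda dct: dct,
    "yesterday": lambda dct: dct - timedelta(days=1),
    "tomorrow": lambda dct: dct + timedelta(days=1),
}

def resolve(expr, dct):
    # Relative expressions go through RULES; absolute dates parse directly.
    return RULES[expr](dct) if expr in RULES else date.fromisoformat(expr)

def order_events(events, dct):
    # events: (event_label, temporal_expression) pairs
    return sorted(events, key=lambda ev: resolve(ev[1], dct))
```

Resolving coreferential expressions like "yesterday" against the DCT is what makes events from different documents comparable on one chronological axis.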

    Detecting Misleading Headlines Through the Automatic Recognition of Contradiction in Spanish

    Misleading headlines are part of the disinformation problem. A headline should give a concise summary of the news story, helping the reader decide whether to read the body text of the article, which is why headline accuracy is a crucial element of a news story. This work focuses on detecting misleading headlines through the automatic identification of contradiction between the headline and body text of a news item. When a contradiction is detected, the reader is alerted to the lack of precision or trustworthiness of the headline in relation to the body text. To facilitate the automatic detection of misleading headlines, a new Spanish dataset (ES_Headline_Contradiction) is created for the purpose of identifying contradictory information between a headline and its body text. This dataset annotates the semantic relationship between headlines and body text by categorising the relation between texts as compatible, contradictory or unrelated. Furthermore, another novel aspect of this dataset is that it distinguishes between different types of contradiction, thereby enabling a finer-grained identification of them. The dataset was built via a novel semi-automatic methodology, which resulted in a more cost-efficient development process. The results of the experiments show that pre-trained language models can be fine-tuned with this dataset, producing very encouraging results for detecting incongruency or non-relation between headline and body text. This research work is funded by MCIN/AEI/10.13039/501100011033 and, as appropriate, by “ERDF A way of making Europe”, by the “European Union” or by the “European Union NextGenerationEU/PRTR” through the project TRIVIAL: Technological Resources for Intelligent VIral AnaLysis through NLP (PID2021-122263OB-C22) and the project SOCIALTRUST: Assessing trustworthiness in digital media (PDC2022-133146-C22). It is also funded by Generalitat Valenciana through the project NL4DISMIS: Natural Language Technologies for dealing with dis- and misinformation (CIPROM/2021/21), and the grant ACIF/2020/177.
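    The three-way relation scheme can be illustrated with a crude rule-based stand-in (the paper fine-tunes pre-trained language models; the cue list and threshold here are invented): low lexical overlap suggests "unrelated", overlap plus a negation cue in exactly one of the two texts suggests "contradictory", and the rest is "compatible".

```python
# Invented negation cue list (Spanish and English) for the toy heuristic.
NEGATION = {"no", "not", "never", "nunca", "sin"}

def tokens(text):
    return set(text.lower().split())

def relation(headline, body):
    h, b = tokens(headline), tokens(body)
    # Fraction of (non-negation) headline tokens that also appear in the body.
    overlap = len((h & b) - NEGATION) / max(len(h - NEGATION), 1)
    if overlap < 0.3:
        return "unrelated"
    if bool(h & NEGATION) != bool(b & NEGATION):   # negation in exactly one text
        return "contradictory"
    return "compatible"
```

A fine-tuned model replaces this heuristic in practice, but the label space it predicts is exactly this three-way scheme.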

    Cross-document event ordering through temporal, lexical and distributional knowledge

    In this paper we present a system that automatically builds ordered timelines of events from different written texts in English. The system deals with problems such as automatic event extraction, cross-document temporal relation extraction and cross-document event coreference resolution. Its main characteristic is the application of three different types of knowledge: temporal knowledge, lexical-semantic knowledge and distributional-semantic knowledge, in order to anchor and order the events in the timeline. It has been evaluated within the framework of SemEval 2015. The proposed system improves the current state-of-the-art systems in all measures (by up to eight points of F1-score over other systems) and shows a significant advance in the cross-document event ordering task. This paper has been partially supported by the Spanish government, project TIN2015-65100-R and project TIN2015-65136-C2-2-R.
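    One way the three knowledge types could interact is sketched below. The thresholds are invented and the system's actual anchoring logic is richer: temporal knowledge acts as a hard filter, after which either semantic signal (lexical or distributional) can link two event mentions into the same timeline entry.

```python
def same_timeline_entry(temporal_ok, lexical_sim, distributional_sim,
                        lex_threshold=0.7, dist_threshold=0.6):
    # temporal_ok: boolean from temporal relation inference
    # lexical_sim / distributional_sim: similarity scores in [0, 1]
    if not temporal_ok:          # temporal incompatibility is a hard filter
        return False
    return lexical_sim >= lex_threshold or distributional_sim >= dist_threshold
```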

    NATSUM: Narrative abstractive summarization through cross-document timeline generation

    A new approach to narrative abstractive summarization (NATSUM) is presented in this paper. NATSUM is centered on generating a narrative, chronologically ordered summary about a target entity from several news documents related to the same topic. To achieve this, our system first creates a cross-document timeline in which each time point contains all the event mentions that refer to the same event. This timeline is enriched with all the arguments of the events that are extracted from different documents. Secondly, using natural language generation techniques, one sentence for each event is produced using the arguments involved in the event. Specifically, a hybrid surface realization approach is used, based on over-generation and ranking techniques. The evaluation demonstrates that NATSUM performs better than extractive summarization approaches and competitive abstractive baselines, improving the F1-measure by at least 50% when a real scenario is simulated. This research work has been partially funded by the Ministerio de Economía y Competitividad (Spain) through projects TIN2015-65100-R and TIN2015-65136-C2-2-R, as well as by the project “Análisis de Sentimientos Aplicado a la Prevención del Suicidio en las Redes Sociales (ASAP)” funded by Ayudas Fundación BBVA a equipos de investigación científica. Moreover, it has also been funded by Generalitat Valenciana through the project “SIIA: Tecnologías del lenguaje humano para una sociedad inclusiva, igualitaria, y accesible” with grant reference PROMETEU/2018/089.
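    The over-generation-and-ranking step can be sketched with a toy realizer. The paper's ranker is trained; the bigram scorer and counts below are invented for the sketch: candidate sentences are generated as orderings of an event's arguments, each candidate is scored, and the best one is kept.

```python
from itertools import permutations

def over_generate(verb, args):
    # Over-generation: one candidate per ordering of the non-subject arguments.
    subject, rest = args[0], args[1:]
    return [" ".join([subject, verb, *order]).capitalize() + "."
            for order in permutations(rest)]

def score(sentence, bigrams):
    # Ranking: sum of corpus bigram counts over adjacent word pairs.
    words = sentence.lower().rstrip(".").split()
    return sum(bigrams.get(pair, 0) for pair in zip(words, words[1:]))

def realize(verb, args, bigrams):
    return max(over_generate(verb, args), key=lambda s: score(s, bigrams))
```

Over-generating and then ranking sidesteps hand-writing ordering rules: fluency preferences are pushed into the scorer.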

    A semi-automatic annotation methodology that combines Summarization and Human-In-The-Loop to create disinformation detection resources

    Early detection of disinformation is one of the most challenging large-scale problems facing present-day society, which is why the application of technologies such as Artificial Intelligence and Natural Language Processing is necessary. The vast majority of Artificial Intelligence approaches require annotated data, and generating these resources is very expensive. This proposal aims to improve the efficiency of the annotation process with a two-level semi-automatic annotation methodology. The first level extracts relevant information through summarization techniques. The second applies a Human-in-the-Loop strategy whereby the labels are pre-annotated by the machine, corrected by the human and reused by the machine to retrain the automatic annotator. After evaluating the system, the average annotation time per news item is reduced by 50%. In addition, a set of experiments is performed on the semi-automatically annotated dataset that is generated, so as to demonstrate the effectiveness of the proposal. Although the dataset is annotated in terms of unreliable content, it is applied to the veracity detection task with very promising results (0.95 accuracy in reliability detection and 0.78 in veracity detection). This research work is funded by MCIN/AEI/10.13039/501100011033 and, as appropriate, by “ERDF A way of making Europe”, by the “European Union” or by the “European Union NextGenerationEU/PRTR” through the project TRIVIAL: Technological Resources for Intelligent VIral AnaLysis through NLP (PID2021-122263OB-C22) and the project SOCIALTRUST: Assessing trustworthiness in digital media (PDC2022-133146-C22). It is also funded by Generalitat Valenciana through the project NL4DISMIS: Natural Language Technologies for dealing with dis- and misinformation (CIPROM/2021/21), and the grant ACIF/2020/177.
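    The Human-in-the-Loop cycle can be sketched as follows. The function names and the majority-label "annotator" are invented stand-ins for the paper's components: the machine pre-annotates a batch, the human corrects the labels, and the corrected labels are fed back to retrain the annotator.

```python
from collections import Counter

def train(labeled):
    # Trivial stand-in for the automatic annotator: predict the majority label.
    counts = Counter(label for _, label in labeled)
    majority = counts.most_common(1)[0][0] if counts else "reliable"
    return lambda item: majority

def hitl_round(model, batch, human_correct):
    pre_annotated = [(item, model(item)) for item in batch]     # machine step
    return [(item, human_correct(item, label))                  # human step
            for item, label in pre_annotated]

seed = [("doc1", "reliable")]
gold = {"doc2": "unreliable", "doc3": "reliable"}               # the human's view
corrected = hitl_round(train(seed), ["doc2", "doc3"],
                       lambda item, label: gold.get(item, label))
model = train(seed + corrected)     # retrain on the human-corrected labels
```

The annotation-time saving comes from the human only correcting pre-filled labels instead of labeling from scratch, while each round improves the pre-annotator.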

    ELAINE: rELiAbility and evIdence-aware News vErifier

    Disinformation is one of the main problems of today’s society, specifically the viralization of fake news. This research presents ELAINE, a hybrid proposal for detecting the veracity of news items that combines content reliability information with external evidence. The external evidence is extracted from a scientific knowledge base containing medical information associated with the coronavirus, organized in a knowledge graph created from the CORD-19 corpus. The information is accessed using natural language question answering, and a set of evidence items is extracted and their relevance measured. By combining both reliability and evidence information, the veracity of news items can be predicted, improving both accuracy and F1 compared with using reliability information alone. These results prove that the presented approach is very promising for the veracity detection task.
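    The combination step can be sketched as a late fusion of the two signals. The weights, normalization and threshold below are invented, not the paper's model: a content reliability score is combined with the relevance-weighted support of the evidence retrieved from the knowledge graph.

```python
def veracity(reliability, evidence, w_rel=0.5, w_ev=0.5, threshold=0.5):
    # reliability: content reliability score in [0, 1]
    # evidence: (relevance, supports_claim) pairs from the knowledge graph
    if evidence:
        raw = sum(rel * (1 if sup else -1) for rel, sup in evidence)
        support = (raw / sum(rel for rel, _ in evidence) + 1) / 2  # map to [0, 1]
    else:
        support = 0.5            # no evidence found: neutral contribution
    score = w_rel * reliability + w_ev * support
    return ("true" if score >= threshold else "false"), score
```

Fusing both signals is what lets strong counter-evidence override a superficially reliable-looking text, which is where the reported accuracy and F1 gains over reliability-only prediction would come from.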

    Team GPLSI at AuTexTification Shared Task: Determining the Authorship of a Text

    AuTexTification is a shared task within the IberLEF workshop which aims to determine whether a text has been generated by an Artificial Intelligence (AI) or by a human. The objective of this paper is to report the participation and results of the GPLSI team from the University of Alicante (Spain) in subtask 1, Human or Generated, of the AuTexTification challenge for the English and Spanish languages. We propose and experiment with different approaches based on transfer learning, ensemble learning, fine-tuning existing language models such as RoBERTa or RemBERT, and relying on linguistic features. Our best models for both languages were trained through transfer learning techniques, obtaining 6th and 8th position in the English and Spanish versions of this subtask, respectively. The results obtained in the Spanish version were close to those of the top-performing team. This research work is part of the R&D projects “CORTEX: Conscious Text Generation” (PID2021-123956OB-I00) and “TRIVIAL: Technological Resources for Intelligent VIral AnaLysis through NLP” (PID2021-122263OB-C22), both funded by MCIN/AEI/10.13039/501100011033/ and by “ERDF A way of making Europe”, and “CLEAR.TEXT: Enhancing the modernization of public sector organizations by deploying Natural Language Processing to make their digital content CLEARER to those with cognitive disabilities” (TED2021-130707B-I00), funded by MCIN/AEI/10.13039/501100011033 and “European Union NextGenerationEU/PRTR”. Moreover, it has also been partially funded by Generalitat Valenciana through the project “NL4DISMIS: Natural Language Technologies for dealing with dis- and misinformation” (CIPROM/2021/21), and by the European Commission ICT COST Action “Multi-task, Multilingual, Multi-modal Language Generation” (CA18231).
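    The linguistic-feature line of approaches can be illustrated with a toy extractor. This is illustrative only: the team's best runs used transfer learning, and the two features below are generic examples, not the GPLSI system's actual feature set.

```python
def features(text):
    # Two classic stylometric signals sometimes used to separate
    # machine-generated from human text.
    words = text.lower().split()
    type_token_ratio = len(set(words)) / max(len(words), 1)   # lexical diversity
    avg_word_length = sum(len(w) for w in words) / max(len(words), 1)
    return type_token_ratio, avg_word_length
```

Vectors like these would then feed a conventional classifier, in contrast to the fine-tuned RoBERTa/RemBERT models that operate directly on the raw text.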